PAPA - Packed Arithmetic on a Prefix Adder for Multimedia Applications

Author

  • Neil Burgess
Abstract

This paper introduces PAPA (Packed Arithmetic on a Prefix Adder), a new approach to parallel prefix adder design that supports a wide variety of packed arithmetic computations, including packed add and subtract with saturation, packed rounded average, and packed absolute difference. The approach consists of altering the prefix adder cell logic equations to take advantage of a previously unused “don’t care” state. Logical Effort is employed to assess the delay of the new adder architecture by establishing the extra effort needed to select and drive the appropriate carry signal to the requisite sum sub-word. This adder will find applications in video processors and other multimedia-orientated processor chips that implement packed arithmetic operations.

1. Motivation

Multimedia processor chips (and others) make much use of “packed” arithmetic operations in order to accelerate a variety of digital signal processing algorithms for consumer applications. In such arithmetic units, long wordlength numbers are optionally treated as several independent shorter wordlength numbers; for example, a 32-bit word may be treated as two separate 16-bit words or as four 8-bit words. The main motivation for this mode of operation is to support SIMD processing, with its associated advantages, in the context of a conventional pipelined load-store processor architecture [1]. Moreover, a common arithmetic operation in video processing is the “absolute difference”, denoted |A − B|, which is used widely in video motion estimation and prediction algorithms. Hence, a most valuable operation is a “packed absolute difference” operation, which returns the absolute differences of several independent pairs of 8-bit pixel values simultaneously.
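As a purely behavioural illustration of the packed mode described above, the following Python sketch treats one integer as several independent unsigned sub-words and returns the per-lane absolute difference. It is a reference model of the operation's result only, not the adder architecture the paper proposes; the function name `packed_abs_diff` and the `lanes` and `width` parameters are assumptions introduced for this example.

```python
def packed_abs_diff(a: int, b: int, lanes: int = 4, width: int = 8) -> int:
    """Per-lane |A - B| on unsigned sub-words packed into one integer.

    Hypothetical reference model: each lane is handled independently,
    so no borrow ever crosses a sub-word boundary.
    """
    mask = (1 << width) - 1
    result = 0
    for lane in range(lanes):
        shift = lane * width
        x = (a >> shift) & mask      # sub-word of A in this lane
        y = (b >> shift) & mask      # sub-word of B in this lane
        result |= abs(x - y) << shift
    return result


# Four 8-bit lanes in a 32-bit word.
print(hex(packed_abs_diff(0x10FF0005, 0x20010A05)))
# Two 16-bit lanes in the same 32-bit word.
print(hex(packed_abs_diff(0x10FF0005, 0x20010A05, lanes=2, width=16)))
```

The same 32-bit operands give different results in the two modes, because the sub-word boundaries determine where borrows are allowed to propagate.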
Ordinarily, absolute differences are computed either by performing a subtraction followed by a separate “absolute value” operation, which returns the magnitude of a signed number, or by performing a comparison to order the operands followed by a subtraction in which the smaller operand is subtracted from the larger [2]. Instead of such two-step implementations, absolute differences can be obtained by computing both A − B and B − A and using the signs of the two results to select the positive result [3]. However, computing two differences and discarding one is wasteful, and a better technique is sought. Recently, some authors have described how absolute differences can be derived using a single prefix adder [4, 5], but have not extended this insight to packed arithmetic. Previously reported implementations of packed arithmetic prefix adders include [6, 7], but neither of these proposals is able to support packed late increment operations, which are vital for computing absolute difference and rounded average instructions.

A second valuable arithmetic option for media applications is saturated arithmetic, in which overflows and underflows do not cause exceptions but rather return pre-defined “saturation constants” [8]. Such constants should ideally be incorporated with little or no performance overhead, since the integer adder is typically on the critical path that defines a processor’s clock rate.

This paper describes how a prefix adder can be altered straightforwardly to support packed absolute difference and rounded average instructions as well as packed saturated arithmetic, and further describes how Logical Effort was employed to assess its performance potential.

Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP’02) 1063-6862/02 $17.00 © 2002 IEEE

2. Packed arithmetic on a parallel prefix adder

2.1 Prefix tree cell logic

The parallel prefix carry-lookahead adder is a popular VLSI design technique that accelerates an n-bit addition by means of a parallel prefix tree [9]. A block diagram of a prefix adder is illustrated in Figure 1, where the adder is seen to consist of three blocks: the input bit propagate, generate, and not kill cells; the prefix tree; and the output sum cells. The input cells derive the bit propagate, generate, and not kill signals respectively according to:

    p(i) = a(i) ⊕ b(i)      (1a)
    g(i) = a(i) ∧ b(i)      (1b)
    ¬k(i) = a(i) ∨ b(i)     (1c)
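To make the three blocks concrete, the sketch below models a conventional (unpacked) prefix adder in Python. Only the input-cell equations (1a) to (1c) come from the text above. The carry-merge rule G ← G ∨ (T ∧ G'), T ← T ∧ T' and the Kogge-Stone-style doubling of the span are the textbook formulation and are an assumption here; the modified cell logic that PAPA introduces is not shown in this excerpt and is not modelled. The names `prefix_add`, `G`, and `T` are illustrative.

```python
def prefix_add(a: int, b: int, n: int = 8, carry_in: int = 0):
    """Add two n-bit unsigned integers; return (sum mod 2**n, carry_out)."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask

    # Input cells, equations (1a)-(1c), for all bit positions at once.
    p = a ^ b    # bit propagate
    g = a & b    # bit generate
    nk = a | b   # bit not-kill

    # Prefix tree (textbook Kogge-Stone style, an assumption).
    # Invariant: bit i of G / T is the group generate / group not-kill
    # over the `dist` bits ending at position i (clamped at bit 0).
    G, T = g, nk
    dist = 1
    while dist < n:
        low_ones = (1 << dist) - 1  # groups that already reach bit 0
        G = (G | (T & (G << dist))) & mask
        T = (T & ((T << dist) | low_ones)) & mask
        dist *= 2

    # Carry out of position i, folding in the external carry-in.
    cin_all = mask if carry_in else 0
    carry_out_bits = G | (T & cin_all)
    # Carry into position i (bit n is the adder's carry-out).
    c = (carry_out_bits << 1) | (carry_in & 1)

    # Output sum cells: s(i) = p(i) XOR carry into position i.
    s = (p ^ c) & mask
    return s, (c >> n) & 1


print(prefix_add(0xFF, 0x01))              # wraps around with a carry-out
print(prefix_add(0x12, 0x34, carry_in=1))
```

In this model the not-kill signal plays the role of the group propagate inside the tree, while the XOR propagate p(i) is used only in the sum cells.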

Similar resources

Imprecise Minority-Based Full Adder for Approximate Computing Using CNFETs

Nowadays, portable multimedia electronic devices, which employ signal-processing modules, require power-aware structures more than ever. For applications associated with human senses, approximate arithmetic circuits can be considered to improve performance and power efficiency. On the other hand, scaling has led to some limitations in the performance of nanoscale circuits. According...

A Low Power Full Adder Cell based on Carbon Nanotube FET for Arithmetic Units

In this paper, a full adder cell based on the majority function using Carbon-Nanotube Field-Effect Transistor (CNFET) technology is presented. CNFETs possess considerable features that lead to their wide usage in digital circuit design. Input capacitors and inverters are used in the design of the cell. This design method yields a high degree of regularity and simplicity. The proposed des...

Design of Efficient Han-Carlson-Adder

In digital VLSI systems, binary addition is the most significant arithmetic function. Adders are used extensively in DSP lattice filters, where ripple-carry adders are replaced by parallel prefix adders to reduce delay. An adder is required to be fast, area efficient, and low in power consumption. Here, the parallel prefix adder is introduced as speculative Han...

Design and Implementation of RNS Reverse Converter Using Parallel Prefix Adders

In this paper, the implementation of residue number system reverse converters based on hybrid parallel prefix adders is analyzed. The parallel prefix adder provides high speed and reduced delay arithmetic operations but it is not widely used since it suffers from high power consumption. Hence, a hybrid parallel prefix adder component is presented to perform fast modulo addition in Residue Numbe...

A High-Speed Dual-Bit Parallel Adder based on Carbon Nanotube FET technology for use in arithmetic units

In this paper, a Dual-Bit Parallel Adder (DBPA) based on the minority function using Carbon-Nanotube Field-Effect Transistors (CNFETs) is proposed. The possibility of having several threshold voltage (Vt) levels in CNFETs has led to their wide use in the design of digital circuits. The main goal of the proposed DBPA is to reduce critical path delay in adder circuits. The proposed design positi...

Publication date: 2002